leveldb: introduce trivial version finalization #264
Merged
This PR introduces a bypass for quick version finalization.
For more context, see the go-ethereum issue
We use leveldb as the storage engine in the go-ethereum project. Users frequently complain about the long compaction pauses on archive nodes (this type of node can now hold more than 1 TB of data).
After some investigation, I found that during the long compaction pause the I/O is almost idle while one CPU core is fully loaded. From the pprof data provided by @karalabe, we can also see that most of the time is spent on byte comparisons.
Finally I realized that as the database grows, the number of files per level grows with it. The go-ethereum project currently uses the default db settings, which means an archive node can have more than 500,000 sstable files.
After a compaction, leveldb generates a new version by merging the old version with the change set.
During this version generation, the current code applies qsort to every level, even though most levels are unchanged. As the amount of data in the database increases, the number of files per level also increases rapidly, so the qsort overhead becomes very large.
The idea of this PR is:
Because the new files generated by a compaction are strictly ordered, and they do not overlap with any existing files at the source+1 level, we can use binary search to find each new file's insertion index and insert it directly, instead of re-sorting the whole level.
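The insertion step described above can be sketched as follows. This is a minimal illustration, not the actual goleveldb code: `tFile` is a simplified stand-in for goleveldb's table-file record, and keys are plain strings. It shows the core trick of using `sort.Search` to locate the insertion index for a non-overlapping file in an already-sorted level.

```go
package main

import (
	"fmt"
	"sort"
)

// tFile is a simplified stand-in for a table-file record
// (hypothetical fields; keys reduced to plain strings).
type tFile struct {
	imin, imax string // smallest and largest keys covered by the file
}

// insertSorted places a new, non-overlapping file into an already-sorted
// level using binary search, avoiding a full re-sort of the level.
// Precondition: f does not overlap any file already in level.
func insertSorted(level []tFile, f tFile) []tFile {
	// Find the first existing file whose smallest key is greater than
	// the new file's largest key; the new file belongs just before it.
	i := sort.Search(len(level), func(i int) bool {
		return level[i].imin > f.imax
	})
	level = append(level, tFile{}) // grow by one slot
	copy(level[i+1:], level[i:])   // shift the tail right
	level[i] = f
	return level
}

func main() {
	level := []tFile{{"a", "c"}, {"m", "p"}}
	level = insertSorted(level, tFile{"e", "h"})
	for _, f := range level {
		fmt.Printf("[%s,%s] ", f.imin, f.imax)
	}
	// prints "[a,c] [e,h] [m,p] "
}
```

Each insertion is O(log n) for the search plus the cost of shifting the tail, versus O(n log n) for re-sorting the whole level on every version finalization; crucially, untouched levels are not re-sorted at all.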
This type of trivial version finalization is not suitable for the following events:
since in these events we cannot guarantee that a newly inserted file does not overlap with the other files in its level.